Polynomial and rational function modeling

In statistical modeling (especially process modeling), polynomial functions and rational functions are sometimes used as an empirical technique for curve fitting.

1 Polynomial function models
- 1.1 Advantages
- 1.2 Disadvantages
2 Rational function models
- 2.1 Advantages
- 2.2 Disadvantages
3 See also
4 Bibliography
- 4.1 Historical
5 External links

Polynomial function models

Main article: polynomial interpolation

A polynomial function is one that has the form

$y = a_{n}x^{n} + a_{n-1}x^{n-1} + \cdots + a_{2}x^{2} + a_{1}x + a_{0}$

where n is a non-negative integer that defines the degree of the polynomial. A polynomial of degree 0 is a constant function, of degree 1 a line, of degree 2 a quadratic, of degree 3 a cubic, and so on.

Historically, polynomial models are among the most frequently used empirical models for curve fitting.

Advantages

These models are popular for the following reasons.

Polynomial models have a simple form.
Polynomial models have well known and understood properties.
Polynomial models have moderate flexibility of shapes.
Polynomial models are a closed family. Changes of location and scale in the raw data result in a polynomial model being mapped to a polynomial model. That is, polynomial models are not dependent on the underlying metric.
Polynomial models are computationally easy to use.
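
The closure property can be checked directly. The following is a minimal sketch, not part of the original text: the helper name `affine_substitute` and the coefficient ordering (lowest degree first) are assumptions made for illustration. It computes the coefficients of q(u) = p(alpha*u + beta), showing that a change of location and scale maps a polynomial to another polynomial of the same degree.

```python
def affine_substitute(coeffs, alpha, beta):
    """Return coefficients of q(u) = p(alpha*u + beta).

    coeffs = [a_0, a_1, ..., a_n], lowest degree first (an assumed
    convention for this sketch). The result uses the same ordering
    and has the same length, so the degree does not increase.
    """
    result = []
    # Horner's rule, carried out on coefficient lists instead of numbers.
    for a in reversed(coeffs):
        shifted = [0.0] * (len(result) + 1)
        for i, r in enumerate(result):
            shifted[i] += beta * r        # contribution of beta * result
            shifted[i + 1] += alpha * r   # contribution of alpha*u * result
        shifted[0] += a
        result = shifted
    return result


# p(x) = 1 + 2x + 3x^2 with x = 2u + 1
q = affine_substitute([1.0, 2.0, 3.0], alpha=2.0, beta=1.0)
print(q)
```

The returned list has one entry per coefficient of the original polynomial, so a quadratic in x remains a quadratic in u.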

Disadvantages

However, polynomial models also have the following limitations.

Polynomial models have poor interpolatory properties. High-degree polynomials are notorious for oscillations between exact-fit values.
Polynomial models have poor extrapolatory properties. Polynomials may provide good fits within the range of data, but they will frequently deteriorate rapidly outside the range of the data.
Polynomial models have poor asymptotic properties. By their nature, polynomials have a finite response for finite x values and have an infinite response if and only if the x value is infinite. Thus polynomials may not model asymptotic phenomena very well.
While no procedure is immune to the bias-variance tradeoff, polynomial models exhibit a particularly poor tradeoff between shape and degree. In order to model data with a complicated structure, the degree of the model must be high, indicating that the associated number of parameters to be estimated will also be high. This can result in highly unstable models.
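
The oscillation between exact-fit values can be illustrated with a small experiment. This is a sketch under stated assumptions: the test function 1/(1 + 25x^2) on [-1, 1], the equally spaced nodes, and the helper names are choices made here for illustration and do not come from the original text.

```python
def runge(x):
    """Smooth test function often used to illustrate interpolation trouble."""
    return 1.0 / (1.0 + 25.0 * x * x)


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_interp_error(degree, n_check=2001):
    """Largest absolute error of the exact-fit polynomial on a fine grid."""
    xs = [-1.0 + 2.0 * k / degree for k in range(degree + 1)]
    ys = [runge(x) for x in xs]
    grid = [-1.0 + 2.0 * k / (n_check - 1) for k in range(n_check)]
    return max(abs(lagrange_eval(xs, ys, x) - runge(x)) for x in grid)


err_low = max_interp_error(4)    # 5 equally spaced points
err_high = max_interp_error(10)  # 11 equally spaced points
print(err_low, err_high)
```

For this particular function and node placement, raising the degree makes the worst-case error between the nodes larger rather than smaller, even though both polynomials pass through every data point.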

When modeling via polynomial functions is inadequate due to any of the limitations above, the use of rational functions for modeling may give a better fit.

Rational function models

A rational function is simply the ratio of two polynomial functions.

$y = \frac{a_{n}x^{n} + a_{n-1}x^{n-1} + \cdots + a_{2}x^{2} + a_{1}x + a_{0}} {b_{m}x^{m} + b_{m-1}x^{m-1} + \cdots + b_{2}x^{2} + b_{1}x + b_{0}}$

Here n is a non-negative integer that defines the degree of the numerator, and m is a non-negative integer that defines the degree of the denominator. For fitting rational function models, the constant term in the denominator is usually set to 1. Rational functions are typically identified by the degrees of the numerator and denominator. For example, a quadratic for the numerator and a cubic for the denominator is identified as a quadratic/cubic rational function. A rational function model is a generalization of the polynomial model: rational function models contain polynomial models as a subset (i.e., the case when the denominator is a constant).
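
As a minimal sketch of this definition (the function name `rational_eval` and its argument layout are assumptions for illustration), the model can be evaluated with the denominator constant fixed at 1, so that only b_1, ..., b_m are passed:

```python
def horner(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_k x^k; coeffs are lowest degree first."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def rational_eval(num, den_tail, x):
    """Evaluate a rational function model at x.

    num      = [a_0, a_1, ..., a_n]
    den_tail = [b_1, ..., b_m]; b_0 is fixed at 1, following the
               convention described in the text.
    """
    denominator = horner([1.0] + list(den_tail), x)
    if denominator == 0.0:
        raise ZeroDivisionError("denominator vanishes at x = %r" % x)
    return horner(num, x) / denominator


# Linear/quadratic example: (1 + 2x) / (1 + 0.5x + 0.25x^2) at x = 2
y = rational_eval([1.0, 2.0], [0.5, 0.25], 2.0)
print(y)
```

Passing an empty `den_tail` leaves a constant denominator of 1, which reduces the model to an ordinary polynomial, the subset case mentioned above.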

Advantages

Rational function models have the following advantages:

Rational function models have a moderately simple form.
Rational function models are a closed family. As with polynomial models, this means that rational function models are not dependent on the underlying metric.
Rational function models can take on an extremely wide range of shapes, accommodating a much wider range of shapes than does the polynomial family.
Rational function models have better interpolatory properties than polynomial models. Rational functions are typically smoother and less oscillatory than polynomial models.
Rational functions have excellent extrapolatory powers. Rational functions can typically be tailored to model the function not only within the domain of the data, but also so as to be in agreement with theoretical/asymptotic behavior outside the domain of interest.
Rational function models have excellent asymptotic properties. Rational functions can be either finite or infinite for finite x values, or finite or infinite for infinite x values. Thus, asymptotic behavior can easily be incorporated into a rational function model.
Rational function models can often be used to model complicated structure with a fairly low degree in both the numerator and denominator. This in turn means that fewer coefficients will be required compared to the polynomial model.
Rational function models are moderately easy to handle computationally. Although they are nonlinear models, rational function models are particularly easy nonlinear models to fit.

Disadvantages

Rational function models have the following disadvantages:

The properties of the rational function family are not as well known to engineers and scientists as are those of the polynomial family. The literature on the rational function family is also more limited. Because the properties of the family are often not well understood, it can be difficult to answer the following modeling question: given that the data have a certain shape, what values should be chosen for the degree of the numerator and the degree of the denominator?
Unconstrained rational function fitting can, at times, result in undesired vertical asymptotes due to roots in the denominator polynomial. The range of x values affected by the function "blowing up" may be quite narrow, but such asymptotes, when they occur, are a nuisance for local interpolation in the neighborhood of the asymptote point. These asymptotes are easy to detect by a simple plot of the fitted function over the range of the data. These nuisance asymptotes occur occasionally and unpredictably, but practitioners argue that the gain in flexibility of shapes is well worth the chance that they may occur, and that such asymptotes should not discourage choosing rational function models for empirical modeling.
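
Besides plotting, a numerical check is possible. The following is a rough sketch, not a method prescribed by the text: it scans the fitted denominator on a grid over the data range and reports sub-intervals where it changes sign. The function name, the grid size, and the convention that b_0 = 1 are assumptions, and a root of even multiplicity (where the denominator touches zero without changing sign) would be missed.

```python
def denominator_sign_changes(den_tail, x_min, x_max, n_grid=1000):
    """Find grid cells in [x_min, x_max] where 1 + b_1 x + ... + b_m x^m changes sign.

    den_tail = [b_1, ..., b_m]; the constant term is taken to be 1.
    Returns a list of (left, right) pairs, each bracketing a root of the
    denominator, i.e. a candidate vertical asymptote.
    """
    def denom(x):
        result = 0.0
        for b in reversed([1.0] + list(den_tail)):
            result = result * x + b
        return result

    brackets = []
    left = x_min
    d_left = denom(left)
    for k in range(1, n_grid + 1):
        right = x_min + (x_max - x_min) * k / n_grid
        d_right = denom(right)
        if d_left * d_right <= 0.0:
            brackets.append((left, right))
        left, d_left = right, d_right
    return brackets


# Denominator 1 - x^2 has a root at x = 1 inside the data range [0, 3].
poles = denominator_sign_changes([0.0, -1.0], 0.0, 3.0)
print(poles)
```

An empty result suggests, within the resolution of the grid, that the fitted function has no sign-changing pole in the range of the data.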

One common difficulty in fitting nonlinear models is finding adequate starting values. A major advantage of rational function models is the ability to compute starting values using a linear least squares fit. To do this, p points are chosen from the data set, with p denoting the number of parameters in the rational model. For example, given the linear/quadratic model

$y=\frac{A_0 + A_1x} {1 + B_1x + B_2x^{2}}$

one would need to select four representative points, and perform a linear fit on the model

$y = A_0 + A_1x + \ldots + A_{p_n}x^{p_n} - B_1xy - \ldots - B_{p_d}x^{p_d}y$

Here, p_n and p_d are the degrees of the numerator and denominator, respectively, and the x and y contain the subset of points, not the full data set. The estimated coefficients from this linear fit are used as the starting values for fitting the nonlinear model to the full data set.

Note: This type of fit, with the response variable appearing on both sides of the function, should only be used to obtain starting values for the nonlinear fit. The statistical properties of fits like this are not well understood.

The subset of points should be selected over the range of the data. It is not critical which points are selected, although obvious outliers should be avoided.
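
The starting-value procedure can be sketched as follows. This is an illustrative implementation under stated assumptions: the helper names are hypothetical, exactly p points are used so that the linear fit reduces to solving a p-by-p system, and the example data are generated from known coefficients rather than taken from any real data set.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; choose different points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def starting_values(points, p_n, p_d):
    """Starting values for a p_n/p_d rational model from p = p_n + p_d + 1 points.

    Each point (x, y) gives one row of the linearized model
    y = A_0 + ... + A_{p_n} x^{p_n} - B_1 x y - ... - B_{p_d} x^{p_d} y.
    Returns ([A_0..A_{p_n}], [B_1..B_{p_d}]).
    """
    if len(points) != p_n + p_d + 1:
        raise ValueError("need exactly p_n + p_d + 1 points")
    rows = [[x ** k for k in range(p_n + 1)] +
            [-(x ** k) * y for k in range(1, p_d + 1)]
            for x, y in points]
    z = solve_linear(rows, [y for _, y in points])
    return z[:p_n + 1], z[p_n + 1:]


# Four points taken from y = (1 + 2x) / (1 + 0.5x + 0.25x^2), noise-free.
subset = [(x, (1 + 2 * x) / (1 + 0.5 * x + 0.25 * x * x)) for x in (0.0, 1.0, 2.0, 3.0)]
A_start, B_start = starting_values(subset, p_n=1, p_d=2)
print(A_start, B_start)
```

With noisy data the values returned would only approximate the final estimates; they would then be passed as the initial guess to a nonlinear least squares routine fitted to the full data set.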

Bibliography

Atkinson, A. C., Donev, A. N., and Tobias, R. D. (2007). Optimum Experimental Designs, with SAS. Oxford University Press. pp. 511+xvi. ISBN 978-0-19-929660-6. http://books.google.se/books?id=oIHsrw6NBmoC
Box, G. E. P. and Draper, Norman. 2007. Response Surfaces, Mixtures, and Ridge Analyses, Second Edition [of Empirical Model-Building and Response Surfaces, 1987], Wiley.
Kiefer, Jack Carl (1985). L. D. Brown et al., ed. Jack Carl Kiefer Collected Papers III: Design of Experiments. Springer-Verlag. ISBN 0-387-96004-X.
R. H. Hardin and N. J. A. Sloane, "A New Approach to the Construction of Optimal Designs", Journal of Statistical Planning and Inference, vol. 37, 1993, pp. 339-369
R. H. Hardin and N. J. A. Sloane, "Computer-Generated Minimal (and Larger) Response Surface Designs: (I) The Sphere"
R. H. Hardin and N. J. A. Sloane, "Computer-Generated Minimal (and Larger) Response Surface Designs: (II) The Cube"
Ghosh, S. and Rao, C. R., eds. (1996). Design and Analysis of Experiments. Handbook of Statistics. 13. North-Holland. ISBN 0-444-82061-2.
- Draper, Norman and Lin, Dennis K. J. "Response Surface Designs". pp. 343–375.
- Gaffke, N. and Heiligers, B. "Approximate Designs for Polynomial Regression: Invariance, Admissibility, and Optimality". pp. 1149–1199.
Melas, Viatcheslav B. (2006). Functional Approach to Optimal Experimental Design. Lecture Notes in Statistics. 184. Springer-Verlag. ISBN 038798741X. (Modeling with rational functions)

Historical

Gergonne, J. D. (1815). "Application de la méthode des moindre quarrés a l'interpolation des suites". Annales de mathématiques pures et appliquées 6: 242–252.
Gergonne, J. D. (1974 [1815]). "The application of the method of least squares to the interpolation of sequences". Historia Mathematica 1 (4): 439–447. doi:10.1016/0315-0860(74)90034-2. http://www.sciencedirect.com/science/article/B6WG9-4D7JMHH-20/2/df451ec5fbb7c044d0f4d900af80ec86.
Stigler, Stephen M. (1974). "Gergonne's 1815 paper on the design and analysis of polynomial regression experiments". Historia Mathematica 1 (4): 431–439. doi:10.1016/0315-0860(74)90033-0. http://www.sciencedirect.com/science/article/B6WG9-4D7JMHH-1Y/2/680c7ada0198761e9866197d53512ab4.
Smith, Kirstine (1918). "On the Standard Deviations of Adjusted and Interpolated Values of an Observed Polynomial Function and its Constants and the Guidance They Give Towards a Proper Choice of the Distribution of the Observations". Biometrika 12 (1/2): 1–85. JSTOR 2331929.

External links

Rational Function Models


This article incorporates public domain material from websites or documents of the National Institute of Standards and Technology.
